Artificial Intelligence (AI) has become commonplace for solving routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed the MONAI Consortium, an open-source community that is building standards for AI deployment in healthcare institutions and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem-solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on the processes that take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
Translated by Google Translate
Modelling and forecasting real-life human behaviour using online social media is an active endeavour of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory that could be used to gauge and predict social behaviour. During the last decade, the user base of Twitter has been growing and becoming more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican Legislative Election. To do so, we use a dataset of 15 million election-related tweets in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with results from official census data on population and internet usage in Mexico. These findings suggest that we have reached a period in time when online activity, appropriately curated, can provide an accurate representation of offline behaviour.
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to read out information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing, or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity, based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io.
Digital Twin is an emerging technology that replicates real-world entities in a digital space. It has attracted increasing attention in the transportation field, and many researchers are exploring its future applications in the development of Intelligent Transportation System (ITS) technologies. Connected vehicles (CVs) and pedestrians are among the major traffic participants in ITS. However, the use of Digital Twin in research involving both CVs and pedestrians remains largely unexplored. In this study, a Digital Twin framework for CV and pedestrian in-the-loop simulation is proposed. The proposed framework consists of the physical world, the digital world, and the data transmission between them. The features of the entities (CV and pedestrian) that need to be digitally twinned are divided into external state and internal state, and the attributes in each state are described. We also demonstrate a sample architecture under the proposed Digital Twin framework, which is based on Carla-Sumo Co-simulation and a Cave automatic virtual environment (CAVE). The proposed framework is expected to provide guidance for future Digital Twin research, and the architecture we build can serve as a testbed for further research and development of ITS applications involving CVs and pedestrians.
Image analysis technologies empowered by artificial intelligence (AI) have proved images and videos to be an opportune source of data to learn about humpback whale (Megaptera novaeangliae) population sizes and dynamics. With the advent of social media, platforms such as YouTube present an abundance of video data across spatiotemporal contexts documenting humpback whale encounters from users worldwide. In our work, we focus on automating the classification of YouTube videos as relevant or irrelevant based on whether they document a true humpback whale encounter or not via deep learning. We use a CNN-RNN architecture pretrained on the ImageNet dataset for classification of YouTube videos as relevant or irrelevant. We achieve an average 85.7% accuracy, and 84.7% (irrelevant)/ 86.6% (relevant) F1 scores using five-fold cross validation for evaluation on the dataset. We show that deep learning can be used as a time-efficient step to make social media a viable source of image and video data for biodiversity assessments.
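As a small illustration of how per-class F1 scores like those reported above are computed for a two-class relevant/irrelevant problem, here is a minimal sketch. The function name and the toy labels are hypothetical and are not taken from the paper; the code shows only the standard F1 definition, not the authors' evaluation pipeline.

```python
def f1_per_class(y_true, y_pred, labels):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN) for each label."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Convention assumed here: F1 is 0.0 when the label never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data (made up for illustration only).
y_true = ["relevant", "relevant", "irrelevant", "irrelevant"]
y_pred = ["relevant", "irrelevant", "irrelevant", "irrelevant"]
scores = f1_per_class(y_true, y_pred, labels=("irrelevant", "relevant"))
```

In a five-fold cross-validation setting, such scores would typically be computed on each held-out fold and then averaged.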
Object-goal navigation (Object-nav) entails searching for, recognizing, and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are restricted to static objects (e.g., television, fridge, etc.). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e.g., fruits, glasses, phones, etc.) that frequently change their positions due to human intervention. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting to demonstrate high sample efficiency and reliability.
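To make "optimism in the face of uncertainty" concrete, the following is a minimal sketch that uses the textbook UCB1 rule to pick which navigable location to inspect next. It is a simplification, not the paper's contextual-bandit agent: it ignores context features and the minimum latency solver, and the class name, location names, and `exploration` constant are all assumptions made for illustration.

```python
import math


class UCBLocationSelector:
    """UCB1 over a fixed set of locations (hypothetical, non-contextual)."""

    def __init__(self, locations, exploration=2.0):
        self.locations = list(locations)
        self.exploration = exploration  # assumed constant, as in standard UCB1
        self.visits = {loc: 0 for loc in self.locations}
        self.sightings = {loc: 0 for loc in self.locations}
        self.total_visits = 0

    def estimate(self, loc):
        """Empirical likelihood of spotting the object from this location."""
        n = self.visits[loc]
        return self.sightings[loc] / n if n else 0.0

    def upper_bound(self, loc):
        """Optimistic estimate: empirical mean plus an uncertainty bonus."""
        n = self.visits[loc]
        if n == 0:
            return math.inf  # unvisited locations are tried first
        bonus = math.sqrt(self.exploration * math.log(self.total_visits) / n)
        return self.estimate(loc) + bonus

    def select(self):
        """Pick the location with the highest optimistic estimate."""
        return max(self.locations, key=self.upper_bound)

    def update(self, loc, spotted):
        """Record one visit and whether the target object was seen."""
        self.visits[loc] += 1
        self.sightings[loc] += int(bool(spotted))
        self.total_visits += 1


selector = UCBLocationSelector(["kitchen", "desk"])
first = selector.select()
selector.update(first, spotted=True)
second = selector.select()
selector.update(second, spotted=False)
third = selector.select()
```

The learned `estimate` values play the role of the per-location likelihoods that the abstract describes as rewards for the downstream trajectory solver.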
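This paper describes a revision of the classic Lazy Probabilistic Roadmap algorithm (Lazy PRM) that results from pairing PRM with a novel branch-and-cut (BC) algorithm. Cuts are dynamically generated constraints imposed on the minimum-cost paths over the geometric graphs selected by PRM. Cuts eliminate paths that cannot be mapped into smooth plans that satisfy suitably defined kinematic constraints. We generate candidate smooth plans by fitting splines to the vertices of minimum-cost paths. Plans are validated with a recently proposed algorithm that maps them into finite traces without the need to choose a fixed discretization step. Trace elements describe exactly when plans cross constraint boundaries, modulo arithmetic precision. We evaluate several planners using our methods on the recently proposed BARN benchmark, and we report evidence of the scalability of our approach.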
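For readers unfamiliar with the baseline, the sketch below illustrates the lazy-validation loop of classic Lazy PRM: plan a minimum-cost path on the roadmap, validate only the edges on that path, delete any edge that fails, and replan. It does not implement the branch-and-cut revision, spline fitting, or trace-based validation described above. The function names and the `edge_is_valid` callback are hypothetical, and the roadmap is assumed to be given as a dictionary of 2D points plus an adjacency dictionary.

```python
import heapq
import math


def shortest_path(nodes, edges, start, goal):
    """Dijkstra with Euclidean edge costs; returns a node list or None."""
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == goal:
            path = [u]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v in edges[u]:
            nd = d + math.dist(nodes[u], nodes[v])
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(queue, (nd, v))
    return None


def lazy_prm(nodes, edges, start, goal, edge_is_valid):
    """Validate edges only when they appear on a candidate path."""
    edges = {u: set(vs) for u, vs in edges.items()}  # work on a copy
    checked = set()  # edges already confirmed valid
    while True:
        path = shortest_path(nodes, edges, start, goal)
        if path is None:
            return None  # roadmap no longer connects start and goal
        for u, v in zip(path, path[1:]):
            key = frozenset((u, v))
            if key in checked:
                continue
            if edge_is_valid(nodes[u], nodes[v]):
                checked.add(key)
            else:
                # Remove the invalid edge and replan from scratch.
                edges[u].discard(v)
                edges[v].discard(u)
                break
        else:
            return path  # every edge on the path passed validation


# Toy roadmap: a unit square where every segment touching (1, 0) is blocked.
nodes = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
edges = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
path = lazy_prm(nodes, edges, 0, 3, lambda a, b: (1, 0) not in (a, b))
```

Each iteration either returns a fully validated path or removes at least one edge, so the loop terminates on any finite roadmap.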
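Many high-precision (dis)assembly tasks are still performed by humans, even though they are an ideal opportunity for automation. This paper provides a framework that enables a non-expert human operator to teach a robotic arm to perform complex precision tasks. The framework uses a variable Cartesian impedance controller to execute trajectories learned from kinesthetic human demonstrations. Feedback can be given to interactively reshape or speed up the original demonstration. Board localization is done through a visual estimate of the task board position and is refined through haptic feedback. Our framework is tested on a robot benchmark disassembly challenge in which the robot must perform complex precision tasks, such as key insertion. The results show high success rates for each manipulation subtask, including cases with novel poses of the box. An ablation study is also performed to evaluate the components of the framework.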
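As a rough illustration of what a Cartesian impedance law computes, the sketch below evaluates the textbook spring-damper relation F = K (x_d - x) + D (v_d - v) per translational axis. It is not the paper's controller: the function name is hypothetical, the damping choice D = 2 * sqrt(K) assumes a unit effective mass, and orientation, learned stiffness profiles, and robot dynamics are all omitted.

```python
import math


def impedance_force(stiffness, x_desired, x, v_desired, v):
    """Per-axis Cartesian force from a spring-damper impedance law.

    Damping is set to 2 * sqrt(K), which is critical damping under the
    assumption of unit effective mass (an illustrative choice).
    """
    force = []
    for k, xd, xi, vd, vi in zip(stiffness, x_desired, x, v_desired, v):
        d = 2.0 * math.sqrt(k)
        force.append(k * (xd - xi) + d * (vd - vi))
    return force


# "Variable" impedance: stiffness can differ per axis and change over time,
# e.g. stiffer along z here (values are made up for illustration).
force = impedance_force(
    stiffness=[100.0, 100.0, 400.0],
    x_desired=[0.01, 0.0, 0.0],
    x=[0.0, 0.0, 0.0],
    v_desired=[0.0, 0.0, 0.0],
    v=[0.0, 0.0, 0.1],
)
```

Lowering the stiffness on an axis makes the arm more compliant there, which is one common reason to vary it along a demonstrated trajectory.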
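Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, for example in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies that combines the benefits of imitation learning with the robust handling of system constraints from Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers, a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller that relies on it, effectively solving the corresponding bi-level optimization problem end-to-end. We show that the resulting policy improves on standard MPC performance by leveraging a few expert demonstrations in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC reaches the goal 40% more often in cluttered environments and scores more than 65% better on social metrics when navigating around humans.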